Software Complexity Research Program


Department of Defense (DOD) software production and maintenance is a
large, poorly understood, and inefficient process. Recently Frost and Sullivan
(The Military Software Market, 1977) estimated the yearly cost for software within
DOD to be as large as $9 billion. DeRoze (1977) has also estimated that 115 major
defense systems depend on software for their success. In an effort to find
near-term solutions to software-related problems, the DOD has begun to support
research into the software life-cycle.

A formal 5-year R&D plan (Carlson & DeRoze, 1977) related to the management
and control of computer resources was recently written in response to DOD Directive
5000.29. This plan requested research leading to the identification and validation
of metrics for software quality. The study described in this paper represents an
experimental investigation of such metrics and is part of a larger research
program seeking to provide valuable information about the psychological and human
resource aspects of the 5-year plan.

The challenge undertaken in this research program is to quantify the
psychological complexity of software. It is important to distinguish clearly between
the psychological and computational complexity of software. Computational complexity
refers to characteristics of algorithms or programs which make their proof of
correctness difficult, lengthy, or impossible (Rabin, 1977). For example, as the
number of distinct paths through a program increases, the computational complexity
also increases. Psychological complexity refers to those characteristics of software
which make human understanding of software difficult. No simple relationship between
computational and psychological complexity is expected. For example, a program with
many control paths may not be psychologically complex, as any regularity to the
branching process within a program may be used by a programmer to simplify
understanding of the program.

Halstead (1977) has recently developed a theory concerned with the
psychological aspects of computer programming. His theory provides objective estimates
of the effort and time required to generate a program, the effort required to
understand a program, and the number of bugs in a particular program (Fitzsimmons
& Love, 1978). Although some predictions of the theory are counter-intuitive and
contradict results of previous psychological research, the theory has attracted
attention because independent tests of hypotheses derived from it have proven
amazingly accurate.

Although predictions of programmer behavior have been particularly impressive,
much of the research testing Halstead's theory has been performed without sufficient
experimental or statistical controls. Further, much of the data was based on
imprecise estimating techniques. Nevertheless, the available evidence has been
sufficient to justify a rigorous evaluation of the theory.

Rather than initiate a research program designed specifically to test the
theory of software science, a research strategy was chosen which would generate
suggestions for improving programmer efficiency regardless of the success of any
particular theory. This research focuses on four phases of the software
life-cycle: understanding, modification, debugging, and construction. Since different
cognitive processes are assumed to predominate in each phase, no single experiment
or set of experiments on a particular phase would provide sufficient basis for
making broad recommendations for improving programmer efficiency. Each experiment
in the series comprising this research program has been designed to test important
variables assumed to affect a particular phase of software development. Professional
programmers will be used in these experiments to provide the greatest possible
external validity for the results (Campbell & Stanley, 1973). In addition, Halstead's
theory of software science and other related metrics can be evaluated with these
data.

Predicting Programmers' Ability to Modify Software

Currently, computer programmers spend more of their time modifying existing
software or converting it to operate in new environments than in developing new
software. Although modifications and maintenance costs have been estimated to
be three times higher than those associated with development, managers continue
to invest their resources in modification in the mistaken belief that development
costs and risks are prohibitive. Thus, modified systems frequently become
inefficient collections of concatenated patches. Decisions to modify programs
would be aided by systematic information estimating the cost, time, and resources
necessary to complete a particular modification. This study was designed to
determine the characteristics of software and requested modifications which are
related to the speed and accuracy with which programmers are able to make
modifications.

Several programming practices should influence the ease with which a program
can  be  modified.  Among  these  practices  are  documentation  and  the  use  of  structured 
coding  techniques.  Dijkstra  (1972)  argued  that  program  construction  should  proceed 
in  a top-down  structured  fashion,  and  that  programs  consistent  with  these  guidelines 
would  be  easier  to  understand,  debug,  and  modify.  In  an  experiment  using  student 
programmers,  Lucas  and  Kaplan  (1974)  found  that  structured  programs  took  less  time 
to  modify.  Sheppard,  Borst,  & Love  (1978)  found  that  programs  which  were 
structured  in  a manner  to  compensate  for  the  lack  of  suitable  control  structures  in 
FORTRAN  were  more  easily  comprehended  by  professional  programmers. 

The use of comments, in-line, global, or both, is another standard software
engineering practice which is thought to be related to ease of modification,
although there is some contention over how the documentation should be implemented.
Global comments preceding a program indicate what objective will be accomplished.
In-line comments delineate exactly how and where the objective is fulfilled.
Use of in-line comments has been encouraged to simplify the process of making changes
to programs (Wilkes, Wheeler, & Gill, 1951; Poole, 1973). Others (Musa, 1976;
Shneiderman, 1977) have found that global comments improved student programmers'
ability to comprehend and modify programs, but contend that in-line comments seemed
distracting. For example, in a FORTRAN modification task with student programmers,
Yasukawa (1974) found that a group given global comments performed better than a
group given detailed comments. However, Newsted (1974) found that on short FORTRAN
programs, comments preceding the code defining the variables were not useful. Still
other computer scientists recommend both global and in-line comments on the theory
that too much documentation is impossible.

In parallel with these attempts to improve programmer efficiency, several
approaches have been developed for predicting the psychological complexity of
software. Presumably, techniques which improve the efficiency of programmers'
performance do so by simplifying the cognitive task facing them. Thus, complexity
metrics are one way of measuring and validating this assumption. Where such
validation is successful these metrics may indicate guidelines for program development.

In  1972,  Halstead  first  published  his  theory  of  software  physics  (renamed 
software  science)  stating  that  algorithms  have  measurable  characteristics  analogous 
to  physical  laws.  According  to  Halstead  (1972a,  1972b,  1975,  1977),  the  amount  of 
effort (E) required to generate a program can be calculated from simple counts of the
actual code. The calculations are based on four quantities from which Halstead
derives the number of mental comparisons required to generate a program: the number
of distinct operators and operands and the total number of occurrences of operators
and  operands.  Preliminary  tests  of  the  theory  reported  very  high  correlations 
(some  greater  than  .90)  between  Halstead's  metric  and  such  dependent  measures  as 
the  number  of  bugs  in  a program  (Cornell  & Halstead,  1976;  Funami  & Halstead,  1975), 
programming  time  (Gordon  & Halstead,  1975),  and  the  quality  of  programs  (Bulut  & 
Halstead,  1974;  Elshoff,  1976;  Gordon,  1977;  Halstead,  1973).  A more  recent  test  by 
Sheppard,  Borst,  & Love  (1978)  indicated  that  the  relationship  between  Halstead's 
measure  and  program  comprehensibility  could  be  affected  by  differences  among  the 
programs  studied. 

McCabe (1976) developed a definition of complexity based on the decision
structure of the program. McCabe's complexity metric, V(G), is the classical
graph-theory cyclomatic number defined as:

V(G) = # edges - # nodes + # connected components.

Simply  stated,  McCabe  counts  the  number  of  elementary  control  paths  through  a computer 
program. 
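
The definition above can be illustrated with a minimal sketch in Python (a language the report itself does not use). The sketch assumes that the control-flow graph is closed by one extra edge from the exit node back to the entry node, the convention under which edges - nodes + components equals 1 for straight-line code. The function name, node labels, and example graphs are illustrative assumptions, not material from the study.

```python
def cyclomatic_number(edges, entry, exit_node):
    """V(G) = edges - nodes + connected components.

    Assumption: the graph is closed with one extra edge from the exit
    node back to the entry node before counting.
    """
    closed = list(edges) + [(exit_node, entry)]
    nodes = {node for edge in closed for node in edge}

    # Count connected components with a small union-find structure.
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in closed:
        parent[find(a)] = find(b)
    components = len({find(node) for node in nodes})

    return len(closed) - len(nodes) + components


# Hypothetical example graphs.
sequence = [("entry", "exit")]
if_then_else = [("test", "then"), ("test", "else"),
                ("then", "join"), ("else", "join")]
loop = [("test", "body"), ("body", "test"), ("test", "exit")]

print(cyclomatic_number(sequence, "entry", "exit"))     # 1
print(cyclomatic_number(if_then_else, "test", "join"))  # 2
print(cyclomatic_number(loop, "test", "exit"))          # 2
```

Under this convention a simple sequence yields 1, and a single two-way decision or a single loop yields 2.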

The present study experimentally evaluated the effects of two programming
practices (i.e., well structured code and use of comments) on the ease with which a
program could be modified. In addition, there was an assessment of the relationship
between the speed and accuracy of making a modification and three software complexity
metrics, namely Halstead's E, McCabe's V(G), and the total number of statements.

Participants

The sample for this experiment consisted of 36 professional programmers from
three different locations within the General Electric Company. These participants
had an average of 5.9 years of programming experience (SD = 4.1), and all had a
working knowledge of FORTRAN. Twenty of these programmers had an engineering
background, while the remaining 16 had diverse backgrounds often including
statistical and non-numeric programming.

Procedure 

A packet of materials was prepared for each participant. The initial
instructions to each participant are presented in Appendix A. In a preliminary exercise,
participants were asked to modify a short FORTRAN program. All 36 participants
were given the same preliminary program and a brief description of its purpose.
Participants were given unlimited time to complete the modification. The purpose
of this introductory program was two-fold:

1)  to  provide  a common  basis  for  comparing  the  skills  of  the  participants 
on  this  type  of  task,  and 

2)  to  control  for  initial  learning  effects. 

Following  this  initial  exercise,  participants  were  presented  in  turn  with  the 
three  programs  which  comprised  the  experimental  task.  One  modification  was  requested 
for  each  program,  and  participants  were  allowed  to  work  at  their  own  pace,  taking 
as  much  time  as  needed  to  implement  the  modification.  An  electronic  timer  was 
used  to  record  the  beginning  and  ending  times  of  each  trial  to  the  nearest  minute. 

Experimental  Design 

In order to control for individual differences in performance, a
within-subjects 3^4 factorial design was employed (Hahn and Shapiro, 1966). Three of the
programs from a previous experiment in this research program (Sheppard, Borst,
& Love, 1978) were used. Three levels of control flow were defined for
each of the three programs, and each of these nine versions was presented with three
types of documentation for a total of 27 programs. Modifications at three levels
of difficulty were developed for each program, generating a total of 81 experimental
conditions.
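
The counts in this design can be checked with a short enumeration. The following Python sketch is illustrative only: the level names for control flow and comments follow the text, while the program and modification labels are hypothetical placeholders.

```python
from itertools import product

PROGRAMS = ("program_1", "program_2", "program_3")   # hypothetical labels
CONTROL_FLOW = ("structured", "quasi-structured", "unstructured")
COMMENTS = ("global", "in-line", "none")
MODIFICATIONS = ("easy", "medium", "hard")           # hypothetical labels

# Program versions: program x control flow x documentation.
program_versions = list(product(PROGRAMS, CONTROL_FLOW, COMMENTS))

# Experimental conditions: each version crossed with a modification level.
conditions = list(product(PROGRAMS, CONTROL_FLOW, COMMENTS, MODIFICATIONS))

print(len(program_versions))  # 27
print(len(conditions))        # 81
```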

Four sets of nine participants were used in the experiment. The participants
in  the  first  three  sets  exhausted  the  total  of  27  program-modification  combinations. 
The  fourth  set  of  9 participants  repeated  the  assignments  of  one  of  the  three 
previous  sets.  Table  1 shows  the  design  for  the  first  27  participants. 

Programmers  at  each  location  were  randomly  assigned  to  the  design  so  that  over 
the  course  of  their  three  experimental  programs  every  participant  had  seen  each 
program,  each  level  of  modification,  each  type  of  documentation,  and  each  type 
of  control  flow.  For  simplicity  the  design  is  presented  in  Table  1 without  regard 
to  the  order  of  presentation  to  the  participants.  One  of  the  six  possible  orders  of 
presentation  of  the  three  programs  was  assigned  randomly  and  without  replacement 
to  each  participant. 

Independent  Variables 

Programs. Three programs were selected from among those employed in a previous
study from this research program (Sheppard, Borst, & Love, 1978). The programs
were considered to be representative of programs actually encountered by professional
programmers. All versions of these three programs were compiled and executed using
appropriate test data. The programs were all written in standard FORTRAN.

Complexity of control flow. Three levels of control flow complexity were
defined for each program. The least complex level adhered strictly to the tenets of
modern structured programming (Dijkstra, 1972). Program flow proceeded from top to
bottom with one entry and one exit. Neither backward transfer of control nor
arithmetic IFs were allowed.

In FORTRAN, awkward constructions often occur when structured programming
practices are applied rigorously, such as DO loops with dummy variables (Tenny, 1974).
These  awkward  constructions  were  largely  eliminated  in  the  moderate  or  quasi- 
structured  level  where  a more  natural  control  flow  was  allowed.  A judicious  use 
of  backward  GO  TO  statements  and  multiple  exits  was  permitted.  IF  statements  were 
again  restricted  to  assignment  and  logical  IF's. 

In  the  most  complex  (i.e.,  unstructured)  versions  of  each  program  the  control 
flow  was  not  straightforward.  GO  TO  statements  occurred  frequently,  and  backward 
transfer  of  control  was  not  restricted.  The  three-way  transfer  of  control  statement 
(arithmetic  IF)  was  allowed  only  at  this  level  (Appendix  B). 

Comments. Three types of comments were tested in this experiment: global,
in-line, and none. Global comments provided an overview of the function of the
program and identified the primary variables. In-line comments were interspersed
throughout the program and described the specific function of small sections of
code. Examples are presented in Appendix C.

[Table 1. Experimental Design. Each cell represents one of the three assignments
of a participant within the block of 27 participants.]

Modifications.  Three  types  of  modifications  were  selected  for  each  program 
as  representative  of  the  tasks  a programmer  might  be  expected  to  encounter  in 
relation  to  such  programs.  The  level  of  difficulty  for  seven  of  the  nine  modifications 
increased  as  the  number  of  lines  which  had  to  be  added  to  the  code  increased.  In 
every  case,  the  hardest  modification  for  each  program  was  the  one  which  required  the 
most  lines  of  code  to  be  inserted  into  the  original  program. 

Covariates

In order to obtain a measure which was assumed to be related to programming
ability, all participants were required to perform the same preliminary task. A
short program was given to the participants to modify. Their scores on this task
were used as a covariate to measure individual performance differences. Participants
were asked their type of programming experience, the number of years they had been
programming professionally, and the size of the largest program they had ever
constructed. Order of presentation was measured as a situational covariate.

Complexity  Metrics 

Halstead's E. Halstead's E metric was computed from a program (based
on Ottenstein, 1976) which had as input the source code listings of nine programs
(three separate programs at each of three levels of complexity). The computational
formula was:


$$E = \frac{(N_1 + N_2)\,\log_2 (n_1 + n_2)}{(2/n_1)\,(n_2/N_2)}$$

where

$n_1$ = number of unique operators
$n_2$ = number of unique operands
$N_1$ = total number of occurrences of operators
$N_2$ = total number of occurrences of operands
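
The formula can be applied directly once operators and operands have been counted. The Python sketch below is a minimal illustration, not the Ottenstein-based program used in the study. It assumes the tokens have already been classified as operators or operands, and counting rules for that step vary; the function name and the example statement are hypothetical.

```python
import math


def halstead_effort(operators, operands):
    """Halstead's E from lists of operator and operand occurrences."""
    n1 = len(set(operators))   # unique operators
    n2 = len(set(operands))    # unique operands
    N1 = len(operators)        # total operator occurrences
    N2 = len(operands)         # total operand occurrences

    volume = (N1 + N2) * math.log2(n1 + n2)   # numerator of the formula
    level = (2 / n1) * (n2 / N2)              # denominator of the formula
    return volume / level


# Hypothetical statement:  A = B + C * B
effort = halstead_effort(["=", "+", "*"], ["A", "B", "C", "B"])
print(round(effort, 2))
```

For this toy statement the denominator is (2/3)(3/4) = 1/2, so E is 14 log2(6).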

McCabe's V(G). McCabe's metric is the classical graph-theory cyclomatic
number, defined as V(G) = # edges - # nodes + # connected components. Because the
McCabe measure is defined only for programs that adhere strictly to the rules of
structured programming, some modifications to the metric were necessary in order to
evaluate the less structured control flow versions.

In the simplest program possible, V(G) = 1; sequences do not add to the
complexity. IF-THEN-ELSE is valued as 2, increasing the complexity by 1. A DO
or DO WHILE is also 2, the assumption being that there are really only two control
paths, the straight path through the DO and the return to the top, regardless of
the number of times executed. Clearly a DO executed 25 times is not 25 times more
complex than a DO executed once.

In order to compute the metric for unstructured programs, several alterations
were made. An additional RETURN was counted as an extra path in each case, keeping
the cyclomatic number the same as that of a "GO TO end". For statements of the
form:

IF ( ) 100, 200, 300

the complexity was increased by 2 as opposed to the logical IF, which increases
the complexity by 1. These are small changes which appear to be reasonable extensions
of McCabe's theory. However, one difficulty arises with the arithmetic IF when two
paths are the same:

IF ( ) 100, 100, 200

In order to standardize the procedure, it was counted in the same way as the standard
arithmetic IF, with 2 added to the V(G) metric.

All experimental programs were checked before the experiment to ensure that
the  most  complex  version  of  the  program  had  the  highest  McCabe  value  and  the  least 
complex  version  had  the  lowest  value. 
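
The counting rules described above can be summarized as a table of increments added to a base value of 1. The Python sketch below is one possible reading of those rules; the construct labels and function name are hypothetical, and it assumes the control constructs of a program have already been identified.

```python
# Increment contributed by each construct, following the rules in the text.
INCREMENTS = {
    "logical_if": 1,      # IF-THEN-ELSE or logical IF
    "do": 1,              # DO or DO WHILE
    "arithmetic_if": 2,   # three-way IF, even when two targets coincide
    "extra_return": 1,    # each RETURN beyond the first
}


def extended_vg(constructs):
    """Extended V(G): 1 for the simplest program plus one increment per construct."""
    return 1 + sum(INCREMENTS[kind] for kind in constructs)


print(extended_vg([]))                                       # 1
print(extended_vg(["logical_if"]))                           # 2
print(extended_vg(["do", "arithmetic_if", "extra_return"]))  # 5
```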

taken by the participant to perform the task. The individual steps necessary for
correct implementation of the requested modifications had been delineated in advance
and assigned equal weights. The participants' changes were then compared to the
criteria. Thus, a percentage score reflecting the correctness of each modification
was  achieved.  All  of  the  responses  were  scored  by  the  same  grader.  The  time  to 
write  a modification  was  measured  to  the  nearest  minute. 
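
With equally weighted steps, the accuracy score reduces to the percentage of criterion steps implemented correctly. A minimal Python sketch follows; the function name and the example checklist are hypothetical.

```python
def accuracy_score(steps_done):
    """Percentage of equally weighted criterion steps implemented correctly."""
    if not steps_done:
        raise ValueError("at least one criterion step is required")
    return 100.0 * sum(bool(step) for step in steps_done) / len(steps_done)


# Hypothetical modification with four criterion steps, three done correctly.
print(accuracy_score([True, True, False, True]))  # 75.0
```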


The analysis of results was conducted in two phases. The first phase was an
experimental test of the programming practices, while the second phase was an
evaluation of the software complexity metrics.

The first phase, involving experimental manipulations of programming practices,
was analyzed by a hierarchical regression analysis. In this analysis domains
of variables were entered sequentially into a multiple regression equation to
determine if each successive domain significantly improved the prediction of the equation
developed from domains already entered. Thus, the order in which domains were
entered into the analysis was important. In this study, effects related to differences
among participants, programs, modifications, and order were entered into the analysis
prior to evaluating the effects of programming practices. The variable domains were
entered in the following order:

Differences related to participants and programs

1) Pretest scores
2) Order of presentation
3) Specific program
4) Modification

Programming practices

5) Program structure
6) Documentation

The variables representing the different conditions of domains three through six were
effect  coded  (Kerlinger  & Pedhazur,  1973). 
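
The procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis program used in the study: it uses ordinary least squares solved through the normal equations, codes the last level of each factor as -1, and runs on made-up data. All function names are hypothetical.

```python
def effect_code(levels):
    """Effect-code a factor: k levels give k-1 columns; the last level is -1."""
    categories = sorted(set(levels))
    reference = categories[-1]
    return [[-1.0 if v == reference else float(v == c) for v in levels]
            for c in categories[:-1]]


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
                 for i in range(n))
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(domains, y):
    """Enter domains in order; return each domain's increment in R^2."""
    entered, previous, increments = [], 0.0, []
    for name, columns in domains:
        entered.extend(columns)
        current = r_squared(entered, y)
        increments.append((name, current - previous))
        previous = current
    return increments


# Made-up data: a pretest score and a three-level program factor.
pretest = [0, 1, 2, 3, 4, 5]
program = ["a", "b", "c", "a", "b", "c"]
score = [1, 2, 3, 7, 8, 9]
steps = hierarchical_r2([("pretest", [pretest]),
                         ("program", effect_code(program))], score)
for name, gain in steps:
    print(name, round(gain, 3))
```

Because each increment is measured against the domains already entered, reordering the domains would generally change the variance credited to each one.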

The second phase of analysis investigated relationships among Halstead's E,
McCabe's V(G), number of statements in the program, and the time and score on
the experimental task. Correlations among these measures were examined in both
the modified and unmodified programs.

Results


Across all experimental conditions an average score of 62% was received on
modifications made (SD = 31%). The 108 accuracy scores ranged in value from five
scores of 0 to 24 scores of 100; they were negatively skewed. The average time
to complete the modifications was 17.9 minutes (SD = 11.4), ranging from a low of 2
minutes to a high of 59 minutes. The time data were positively skewed. Score
and time were uncorrelated.

Pretest 

Means and standard deviations for the pretest accuracy score (M = 66%, SD = 30%)
and time to completion (M = 21.4 min, SD = 14.6) were similar to those observed on
the experimental tasks. Score and time for the pretest were correlated -.44 (df = 34,
p ≤ .005), indicating that participants with high scores worked more quickly; but no
causal interpretation is implied. Pretest performance was modestly related to
experience in that the number of statements in the largest program a participant had ever
written was related to the score (r = .32, p ≤ .05), while participants with more
years of experience were able to complete their modifications more quickly
(r = -.35, p ≤ .025). With the exception of pretest score, none of the individual
difference variables were related to the dependent variables on the experimental tasks.


Accuracy  and  Completeness  of  Modification 

Results presented in Table 2 indicate that overall, only 19% of the variance
in scores on the modifications could be predicted by the variable domains measured
here. However, there were substantial differences in the degree to which performance
on the three programs could be predicted. Performance on two of the programs
was reasonably predictable; half of the variance was accounted for in
the separate results for each program, and 35% was accounted for in the combined
results for both programs. However, the results for the third program were not
significant.

Modest relationships with the performance score were observed for both the
pretest and order of presentation. The significance of the order variable suggests
the presence of learning or practice effects. However, this interpretation is
confounded by the fact that random assignment of presentation order failed to
counterbalance the number of times each condition appeared in each position order. With
each succeeding experimental task, participants made more complete modifications
in less time. However, the two programs on which performance proved most predictable
were presented to participants more frequently in the second or third order position.

TABLE 2

Hierarchical Regression Analyses for Accuracy of Modification

                                              ΔR²
                                All programs    Two most predictable
Variable domain                 (n=108)         programs (n=72)

1) Pretest score                .05*            .05
2) Order of presentation        .05*            .13**
3) Specific program             .02             .01
4) Modification difficulty      .02             .09**
5) Control flow complexity      .04             .07**
6) Comment type                 .01             .00

All domains                     .19             .35***

Note: Figures indicate the percent of variance contributed to prediction
of performance in addition to that afforded by preceding domains.
Significance levels indicate whether this represented a significant
contribution to prediction.

Accuracy scores differed as a function of the difficulty of the modification on
the two most predictable programs. As expected, performance was not as good on
modifications which required more lines of code to be inserted. The complexity of the
control flow also affected accuracy scores on the two programs for which accuracy
was most predictable; modifications to the structured programs were more accurate
and complete than those made to unstructured programs.

Accuracy  scores  did  not  differ  as  a function  of  differences  among  programs. 
However,  differences  among  programs  moderated  relationships  among  other  independent 
variables  and  the  accuracy  criterion.  While  mean  accuracy  scores  did  not  differ 
significantly  across  programs,  relationships  between  accuracy  and  other  variables 
did  differ  among  these  programs.  No  differences  in  scores  were  observed  as  a 
function  of  the  type  of  comments  included  in  the  program. 

Time  to  Complete  Modifications 

Data  presented  in  Table  3 indicate  that  28%  of  the  variance  in  the  time 
required  to  complete  the  modifications  across  all  three  programs  could  be  accounted 
for  by  variables  studied  here.  The  time  to  complete  the  modification  was  more  easily 
predicted  than  the  accuracy  of  the  modification  on  the  program  for  which  prediction  of 
accuracy  was  low.  Although  time  to  completion  was  not  as  highly  predicted  on  this 
program  as  it  was  on  the  other  two,  including  data  from  it  in  the  regression  analysis 
did  not  lower  the  percents  of  variance  accounted  for  to  the  extent  that  had  been 
observed  in  the  accuracy  analysis. 

Results  of  the  hierarchical  regression  for  time  were  similar  to  the  results 
which  had  been  observed  for  accuracy.  The  specific  program  and  type  of  comments 
were  unrelated  to  the  criterion.  Unlike  the  earlier  analysis  for  accuracy  scores, 
however,  the  pretest  results  were  not  related  to  time  to  complete  the  modification. 
Significant  effects  were  observed  for  difficulty  of  the  modification  and  order  of 
presentation, although again, the interpretation of the effect for this latter
variable  is  confounded.  Although  control  flow  complexity  was  significantly  related 
to  the  accuracy  of  the  modification  on  the  two  programs  on  which  accuracy  was  most 
predictable,  no  such  effect  was  observed  for  the  time  to  complete  the  modification. 

TABLE 3

Hierarchical Regression Analyses for Modification Time

                                              ΔR²
                                All programs    Two most predictable
Variable domain                 (n=108)         programs (n=72)

1) Pretest time                 .03             .00
2) Order of presentation        .06**           .06
3) Specific program             .01             .01
4) Modification difficulty      .15**           .29**
5) Control flow complexity      .02             .00
6) Comment type                 .01             .01

All domains                     .28***          .37***

Note: Figures indicate the percent of variance contributed to prediction
of performance in addition to that afforded by preceding domains.
Significance levels indicate whether this represented a significant
contribution to prediction.

**p ≤ .01   ***p ≤ .001

A post hoc inspection of the nine individual modifications in this experiment
verified  that  the  number  of  new  statements  to  be  inserted  into  the  code  was 
related  to  the  time  required  to  make  the  modification.  Fitting  a hyperbolic  function 
to  these  data  using  least  squares  procedures  (Figure  1)  resulted  in  an  r,  of  .80  and 
a standard  error  of  estimate  of  2.53.  No  such  relationship  was  found  for  score. 
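
The report does not state the functional form of the hyperbola. As a hedged illustration, the Python sketch below assumes the form time = a + b / statements, which is linear in 1/statements and therefore has a closed-form least-squares solution. The function name and the data points are made up and do not reproduce the study's values.

```python
def fit_hyperbola(x, y):
    """Least-squares fit of y = a + b / x (an assumed hyperbolic form)."""
    u = [1.0 / value for value in x]          # transform so the model is linear
    n = len(u)
    mean_u, mean_y = sum(u) / n, sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    b = s_uy / s_uu
    a = mean_y - b * mean_u

    ss_res = sum((yi - (a + b * ui)) ** 2 for ui, yi in zip(u, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    std_error = (ss_res / (n - 2)) ** 0.5     # standard error of estimate
    return a, b, r_squared, std_error


# Made-up points lying exactly on y = 10 - 8 / x.
a, b, r2, se = fit_hyperbola([1, 2, 4, 8], [2, 6, 8, 9])
print(round(a, 3), round(b, 3), round(r2, 3))
```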

Software  Complexity  Metrics 

Relationships  among  metrics.  Correlations  among  Halstead's  and  McCabe's  metrics 
and  the  length  (number  of  statements)  are  presented  in  Table  4 for  the  original 
programs  and  their  modified  versions.  There  were  nine  different  versions  of  the 
original  programs  (three  programs  each  with  three  versions  of  control  flow)  and 
27 modified versions representing three different modifications to each original
program. Correlations among these measures were quite high on both the original and
modified  programs,  especially  between  length  and  Halstead's  E. 

Relationships  with  criteria.  Correlations  between  the  three  complexity  metrics 
and the two dependent variables are shown in Table 5 for individual datapoints (n = 108)
and  data  aggregated  across  the  27  modified  programs.  In  each  case,  the  correlations 
on  the  aggregated  data  were  numerically  larger  than  those  in  the  unaggregated  data. 
These  larger  correlations  result  from  the  elimination  of  individual  differences  and 
other  sources  of  error  through  the  aggregation  process.  The  strongest  relationship 
on  the  original  programs  was  a tendency  for  higher  McCabe  values  to  be  associated 
with  lower  accuracy  scores,  but  the  largest  number  of  significant  relationships  were 
observed  in  relationship  to  the  modified  programs.  While  McCabe's  V(G)  continued  to 
demonstrate the largest relationship with score, both the length and Halstead's E
metrics  demonstrated  moderate  correlations  with  the  time  to  complete  the  modification. 

Correlations  between  complexity  metrics  and  performance  measures  were  found  to 
differ  with  different  types  of  comments.  Table  6 presents  the  correlations  between 
complexity  metrics  and  performance  measures  for  the  data  generated  with  each  type 
of  comment.  All  but  one  of  the  significant  correlations  observed  occurred  when  no 
comments  were  included  in  the  program.  These  correlations  were  stronger  on  the 
modified  programs  than  on  the  original  programs. 
[Figure 1. Additional Statements Required Versus Time]


TABLE 4

Correlations among Measures of Software Complexity

                                 Correlations
Metric                         Length      V(G)

Original programs (n = 9)
  McCabe's V(G)                 .80**
  Halstead's E                  .97***      .77**

Modified programs (n = 27)
  McCabe's V(G)                 .83***
  Halstead's E                  .90***      .77***

TABLE 5

Correlations between Complexity Metrics and
Performance Measures for Aggregated
and Unaggregated Data

                                         Correlations
                             Original Program          Modified Program
Criterion                   Length   V(G)     E       Length   V(G)     E

Accuracy score
  Unaggregated (n = 108)    -.17*    -.22**   -.12    -.20*    -.22*    -.17*
  Aggregated (n = 27)       -.28     -.38*    -.21    -.37*    -.46**   -.29

Time to completion
  Unaggregated (n = 108)     .13      .14      .16*    .30***   .23**    .28**
  Aggregated (n = 27)        .20      .22      .25     .45**    .34*     .44**

*p ≤ .05   **p ≤ .01   ***p ≤ .001


TABLE 6

Correlations between Complexity Metrics and
Performance Measures under Different
Types of Commenting

                                         Correlations
Criterion and                Original Program          Modified Program
type of commenting          Length   V(G)     E       Length   V(G)     E

Accuracy score
  In line                   -.03     -.09      .01     .03      .01      .03
  Global                    -.14     -.22     -.06    -.23     -.23*    -.18
  None                      -.31*    -.34*    -.28*   -.37*    -.34*    -.34*

Time to completion
  In line                    .14      .09      .16     .16      .07      .16
  Global                     .02      .13      .09     .18      .21      .21
  None                       .26      .21      .26     .55***   .42**    .47**

Discussion 


Three aspects involved in the programmers' task of modifying software are:
1) characteristics of the code to be modified, 2) characteristics of the requested
modification, and 3) characteristics of the programmer. The main factors found
to influence programmers' ability to correctly modify programs were the difficulty
of the requested modification and the order of presentation. Other influences
were the complexity of the control flow of the original program and individual
differences among programmers as measured by a pretest. Each of these factors
contributed separately to the prediction of the performance on the task studied.
Contrary to expectations, documentation did not influence performance. Several
metrics of software complexity were, however, helpful in predicting the accuracy
of a modification and the time required to complete it.

It  is  not  surprising  that  differences  in  the  difficulty  of  the  modifications 
were  related  to  the  time  taken  to  implement  the  modifications.  The  effect  was 
more  pronounced  for  time  than  for  accuracy.  The  number  of  new  lines  to  be  added 
was  the  significant  criterion  for  explaining  the  time  spent  to  finish  a task,  rather 
than  the  number  of  in-line  changes,  such  as  deletions  or  substitutions  of  operators  and 
operands.  In  general,  the  more  new  lines  to  be  created,  the  longer  the  time  expended. 

The  data  reported  here  suggest  that  the  difficulty  of  a modification  affects 
the  accuracy  with  which  it  is  implemented.  This  result  agrees  with  a previous 
study  by  Boies  and  Gould  (1974)  concerned  with  syntactic  errors.  They  monitored 
editor  commands  which  either  inserted  or  substituted  code  in  programs  at  a large 
research  center.  Programs  with  syntactic  errors  averaged  32  inserted  new  lines, 
while  programs  without  syntactic  errors  averaged  only  three  inserted  lines.  There 
was  no  difference  in  the  number  of  substitutions  to  existing  code.  These  findings 
probably  indicate  a greater  cognitive  difficulty  in  creating  code  than  in  merely 
deleting  or  adapting  it. 

A significant effect due to the order of presentation of the programs suggests
the existence of a learning effect as the programmers progressed from task to task.
Such effects were not observed in a previous experiment (Sheppard, Borst, &
Love, 1978) which involved understanding, as opposed to modifying, programs. The
failure of random assignment of presentation order to counterbalance the effects of
program differences does not permit a clear interpretation of the learning effect.

Control flow complexity was marginally related to the accuracy of the
modification, but not to the time spent making it. This effect only occurred in the
two programs where performance was most easily predicted. Structured code tended
to produce more accurate modifications.

It was anticipated that the inclusion of documentation, either global or
in-line comments, would significantly improve performance on a modification task.
No such improvement occurred on the programs used in this experiment. This is
counterintuitive; however, it is consistent with the lack of a significant effect in
our previous study (Sheppard, Borst, & Love, 1978) for a related cognitive programming
aid, mnemonic variable names. This lack of effect for cognitive programming aids
may have occurred for one of two reasons. First, in the experiment where levels
of variable mnemonicity were manipulated, global comments were provided with all
programs. In the current experiment where types of comments were manipulated,
mnemonic variable names were provided across all types of comments in all programs.
Thus, the existence of one type of cognitive programming aid may have reduced the
additional information available from the type of aid being experimentally manipulated,
reducing its impact on performance.

A second possibility is that these cognitive programming aids do not contribute
significantly to performance for programs of the modular size (approximately
50 lines) employed here. In large systems with many modules and thousands
of lines of code, cognitive programming aids may have more impact on performance
because of the increased amount of information to be processed. Thus, it may be
that program size moderates the relationship between cognitive programming aids
such as documentation or mnemonic variable names and performance on various programming
tasks.


As expected from previous work (Sheppard, Borst, & Love, 1978), this experiment
showed extremely high correlations among the metrics used: length of the
program, Halstead's E, and McCabe's V(G). Since Halstead's theory of software
science applies primarily to programs in final form rather than programs under
development, both the original and modified programs were examined for correlations
with the complexity metrics. In every case, there was a higher correlation with
performance for complexity metrics computed on the modified programs than
had been observed for those computed on the original programs. Nevertheless, the
correlations observed are not as high as those reported by Halstead (1977) in other
verifications of his theory. The size of the programs employed here may have been
a limiting factor in the results obtained. The range of values for the complexity
metrics may not have been sufficient to allow correlational tests to detect the
strength of relationships that have been reported in other contexts (Fitzsimmons &
Love, 1978). When used with small programs, the metrics were equally predictive;
when used in a larger system, it may be that one of the metrics will prove superior.
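
The restriction-of-range argument can be made concrete with a minimal sketch. The numbers are invented for illustration only: the same underlying relationship between a metric and a performance measure gives a smaller correlation when only a narrow band of metric values is sampled. `pearson_r` is a helper written for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data: performance rises with the metric, plus alternating noise.
metric = [1, 2, 3, 4, 5, 6, 7, 8]
performance = [2, 1, 4, 3, 6, 5, 8, 7]

# Full range of metric values.
r_full = pearson_r(metric, performance)

# Restricted range: keep only the middle of the distribution (metric 3..6),
# as might happen when every program is about the same size.
kept = [(m, p) for m, p in zip(metric, performance) if 3 <= m <= 6]
r_restricted = pearson_r([m for m, _ in kept], [p for _, p in kept])

print(round(r_full, 3), round(r_restricted, 3))
```

With these invented values the correlation drops from about .90 over the full range to .60 over the restricted range, although the noise is identical in both cases.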

Relating the metrics to performance under conditions distinguished by type
of comments uncovered the interesting result that the metrics appeared to predict
performance better when there was no documentation presented than when either
in-line or global comments appeared. Since the metrics are based on the code, comments
of either type add information which the metrics do not evaluate. In this experiment
that additional or possibly redundant information neither improved nor hindered
the programmer's performance in any way. This suggests that comments are not
ignored but that their effect on the programming tasks may be more complex than
can be explained by simple effects on performance.

Differences among programs played an important role in this experiment, as
they had done in a previous study (Sheppard, Borst, & Love, 1978). The complexity
metrics provided one source of information about program differences, but there
were other factors within the programs which were not assessed by these metrics.
Such factors may involve the complexity of the task performed by the program.
Thus, the cognitive difficulty of the program to the programmer may involve an
interaction between program characteristics and individual differences, such as the
programmer's experience with similar programs. Further research on complexity
metrics should evaluate ways of assessing the complexity of the task performed
by the program, in addition to the factors currently assessed by the Halstead and
McCabe metrics.

APPENDIX A

INSTRUCTIONS TO PARTICIPANT

GOOD MORNING!!!


Today we are going to ask you to participate in an experiment which we
hope will be both entertaining and challenging.

This  work,  sponsored  by  the  Office  of  Naval  Research,  is  being  done  to 
make  computer  programs  easier  to  modify.  To  do  this,  we  will  give  you  three 
separate  programs  and  modification  slips  and  ask  you  to  make  the  required  modification 
to  each  program. 

Our  purpose  is  to  evaluate  characteristics  of  programs  which  make  them  easier 
to  modify.  It  is  not  to  evaluate  computer  programmers.  Your  performance  on  a program 
will  be  compared  only  to  your  performance  on  other  programs.  Your  only  competition 
is  yourself.  All  programs  and  papers  that  you  will  be  handed  are  carefully  numbered 
so  it  is  not  necessary  for  you  to  put  your  name  on  any  of  these. 

We would like you to answer the following questions for our research purposes:

1. How long have you been programming in FORTRAN professionally?

   _____ years _____ months

2. Please circle one of the following: Has your primary experience been with
   Engineering, Statistical, Non-Numeric, or Other programs?

   Also, please briefly describe your specific areas of programming experience.


3. Approximately how many source code instructions were in the longest FORTRAN
   program that you have ever written? Please exclude blank lines and comments.


During this experiment, each of you will be working on a different program.
If someone else seems to finish earlier than you, don't be concerned. They will have
been working on something else entirely which might not require as much time.

We  will  begin  this  morning  with  a short  test  program.  We  will  ask  you  to  make 
the  modification  as  accurately  as  possible,  and  to  raise  your  hand  as  soon  as  you  are 
through.  Because  of  the  concentration  required  for  this  task,  we  ask  you  to  make 
an  extra  effort  to  remain  quiet  so  that  others  will  not  be  distracted. 

If  there  are  any  questions,  please  ask  them  at  this  time. 

CONTROL STRUCTURES ALLOWED IN THE THREE LEVELS OF COMPLEXITY


APPENDIX C


SELECT, UNSTRUCTURED, NO COMMENTS


      SUBROUTINE SELECT(N,STR,ERR,N1,N2)
      EXTERNAL RND
      INTEGER STR(N),MIX(26),RCHR,ERR,STO,ALP(26)
      DATA ALP/1HA,1HB,1HC,1HD,1HE,1HF,1HG,1HH,1HI,1HJ,
     1 1HK,1HL,1HM,1HN,1HO,1HP,1HQ,1HR,1HS,1HT,1HU,1HV,1HW,
     2 1HX,1HY,1HZ/
      IF(N-26) 90,90,70
   70 ERR=99
      GO TO 300
   90 ERR=0
      I = 0
   92 I = I + 1
      IF (I - 26) 95, 95, 100
   95 MIX(I)=ALP(I)
      GO TO 92
  100 I = 0
  102 I = I + 1
      IF (I - 25) 115, 115, 120
  115 RCHR= RND(I,26,N1,N2)
      STO=MIX(RCHR)
      MIX(RCHR)=MIX(I)
      MIX(I)=STO
      GO TO 102
  120 I = 0
  122 I = I + 1
      IF (I - N) 125, 125, 300
  125 STR(I)=MIX(I)
      GO TO 122
  300 RETURN
      END
      FUNCTION RND(L1, L2, N1, N2)
      INTEGER L1, L2, N1, N2
      IF (L2-L1) 10,20,20
   10 INDP = L2
      L2 = L1
      L1 = INDP
   20 SIZ = L2 - L1 + 1
      R = RAN (N1, N2)
      RND = IFIX (R * SIZ + FLOAT (L1))
      END


UNIQUE,  STRUCTURED,  GLOBAL  COMMENTS 


C     SUBROUTINE UNIQUE
C     PURPOSE
C        TO STRIP ADJACENT DUPLICATE ITEMS.
C        FOR EXAMPLE, TO ELIMINATE DUPLICATES IN A PRE-SORTED MAILING
C        LIST, OR HOMOGRAPHS (WORDS WITH THE SAME SPELLING BUT
C        DIFFERENT MEANINGS) FROM A DICTIONARY.
C     USAGE
C        CALL UNIQUE(ARR, N, M)
C     DESCRIPTION OF PARAMETERS
C        ARR  - ARRAY OF ITEMS
C        N    - NUMBER OF ITEMS IN ARRAY
C        M    - MAXIMUM LENGTH OF AN ITEM
C        ALT1 - BUFFER FOR ALTERNATE ITEMS
C        ALT2 - BUFFER FOR ALTERNATE ITEMS
C        I1   - ITEM NUMBER IN ORIGINAL LIST
C        I2   - ITEM NUMBER IN STRIPPED LIST
C     FUNCTION SUBPROGRAMS REQUIRED
C        NONE
C
      SUBROUTINE UNIQUE(ARR,N,M)
      INTEGER ALT1(M),ALT2(M),ARR(N,M)
      I1=1
      I2=1
      DO 10 L=1,M
      ALT1(L)=ARR(I1,L)
   10 CONTINUE
      DO 90 LP=1,N
      DO 20 L=1,M
      ARR(I2,L)=ALT1(L)
   20 CONTINUE
      I2=I2+1
      DO 40 KTR=1,N
      I1=I1+1
      IF (I1 .GT. N) GO TO 100
      DO 30 L=1,M
      ALT2(L)=ARR(I1,L)
   30 CONTINUE
      DO 40 L=1,M
      IF (ALT1(L) .NE. ALT2(L)) GO TO 50
   40 CONTINUE
   50 DO 60 L=1,M
      ARR(I2,L)=ALT2(L)
   60 CONTINUE
      I2=I2+1
      DO 80 KTR=1,N
      I1=I1+1
      IF (I1 .GT. N) GO TO 100
      DO 70 L=1,M
      ALT1(L)=ARR(I1,L)
   70 CONTINUE
      DO 80 L=1,M
      IF (ALT1(L) .NE. ALT2(L)) GO TO 90
   80 CONTINUE
   90 CONTINUE
  100 N=I2-1
      RETURN
      END

CHISQ, QUASI-STRUCTURED, IN-LINE COMMENTS


C CALCULATES DEGREES OF FREEDOM AND CHI-SQUARE
C FOR A GIVEN CONTINGENCY TABLE OF OBSERVED FREQUENCIES
      SUBROUTINE CHISQ(MAT,N,M,CS,DEG,ERR,RTOT,CTOT)
      INTEGER ERR,DEG,PTR
      REAL MAT
      DIMENSION MAT(100),RTOT(N),CTOT(M)
C MAXIMUM NUMBER OF CELLS ALLOWED IS 100
C # OF CELLS = # OF ROWS * # OF COLUMNS
      NM=N*M
      ERR=0
      CS=0.0
C FIND DEGREES OF FREEDOM
      DEG=(N-1)*(M-1)
      IF (DEG .GT. 0) GO TO 10
C NUMBER OF DEGREES OF FREEDOM IS ZERO
      ERR=2
      RETURN
C COMPUTE TOTALS OF ROWS
   10 DO 20 I=1,N
      RTOT(I)=0.0
      PTR=I-N
      DO 20 J=1,M
      PTR=PTR+N
      RTOT(I)=RTOT(I)+MAT(PTR)
   20 CONTINUE
C COMPUTE TOTALS OF COLUMNS
C PTR POINTS TO CELL IN ARRAY
      PTR=0
      DO 30 J=1,M
      CTOT(J)=0.0
      DO 30 I=1,N
      PTR=PTR+1
      CTOT(J)=CTOT(J)+MAT(PTR)
   30 CONTINUE
C COMPUTE GRAND TOTAL
      GTOT=0.0
      DO 40 I=1,N
      GTOT=GTOT+RTOT(I)
   40 CONTINUE
C COMPUTE CHI SQUARE
      PTR=0
      DO 50 J=1,M
      DO 50 I=1,N
      PTR=PTR+1
      EXPT=RTOT(I)*CTOT(J)/GTOT
C IS EXPECTED VALUE LESS THAN 1?
      IF (EXPT .LT. 1.0) ERR=1
      CS=CS+(MAT(PTR)-EXPT)*(MAT(PTR)-EXPT)/EXPT
   50 CONTINUE
      IF (NM .NE. 4) GO TO 70
C COMPUTE CHI SQUARE FOR 2 BY 2 TABLE (SPECIAL CASE)
   60 CS=GTOT*(ABS(MAT(1)*MAT(4)-MAT(2)*MAT(3))-GTOT/2.0)**2
     1 /(CTOT(1)*CTOT(2)*RTOT(1)*RTOT(2))
   70 RETURN
      END